Object Persistence and Availability in Digital Libraries

Authors

  • Michael L. Nelson
  • B. Danette Allen
Abstract

We have studied object persistence and availability of 1,000 digital library (DL) objects. Twenty World Wide Web accessible DLs were chosen and from each DL, 50 objects were chosen at random. A script checked the availability of each object three times a week for just over 1 year for a total of 161 data samples. During this time span, we found 31 objects (3% of the total) that appear to no longer be available: 24 from PubMed Central, 5 from IDEAS, 1 from CogPrints, and 1 from ETD.

1.0 Introduction

We have measured the persistence and availability of digital objects in a variety of publicly available World Wide Web (WWW) digital libraries (DLs). We selected twenty DLs, and from those DLs selected 50 information objects, for a total of 1,000 information objects. Three times a week (Sunday, Tuesday and Friday early morning, Eastern Time zone) from November 21, 2000, through December 9, 2001, scripts were run that download the information object and record the number of bytes successfully transmitted. All tests were run from the machine ruby.ils.unc.edu (152.2.81.1; Solaris 2.7). When the terms "unavailable" and "inaccessible" are used below, it is important to remember that these terms are with respect to ruby.ils.unc.edu.

We are particularly interested in the long-term availability of individual objects in the DL. Similar previous studies have looked at the availability of general http servers (Viles & French, 1995), or at the availability of general web pages (Douglis et al., 1997; Koehler, 1999; Koehler, 2000). One study looked at the availability of DL services that were layered on top of http (Powell & French, 2000). We find similar network and http server availability issues that these studies report. However, for evaluating the availability of individual objects, these studies are insufficient. Certainly, if the http server, DL service, or "entry" web page is missing, then the desired object will be unreachable.
However, simply verifying the operation of an http server, DL service or web page does not imply that the actual object itself is still available or unchanged. On the assumption that being placed in a DL is indicative of someone's desire to increase the persistence and availability of an object, we expect DL objects to survive longer, change less, and be more available than general WWW content.

Of the objects that appeared to be unavailable from the test data, we manually sought out the objects in the tested DL, and in the case of distributed DLs, we went to the original web sites. After doing so, we consider 31 of the 1,000 objects to be currently unavailable. It is possible that these objects are available in some other form, or by another name, and we were simply unable to find them. However, these 31 objects represent results returned in searching and browsing of the DLs in November 2000 that we cannot currently find.

This percentage is similar to what Lawrence et al. (2001) report in their study of the persistence of URLs that appear in technical papers. For recently authored papers, they found over 20% of the URLs listed were invalid. However, manual searches were able to reduce the number of "lost" URLs to 3%. Since not all the URLs extracted for their study were necessarily objects in DLs, we are unsure if 3% lossage is generalizable.

2.0 Test Collection

The twenty DLs we studied are listed in Table 1. We selected these DLs for their mixture of coverage, popularity, geographic locations, and data storage architectures. Some of the DLs have their own "brand" or "identity", some are part of an institution's library services, and others are simply report listings on a web page. However, we considered them to be DLs regardless of their own awareness or promotion of themselves as DLs.
"Contributor" describes from where the DL's content comes, and "Access" indicates if the URLs for the objects represent direct access to the delivery format (i.e., PDF, PostScript) or if access goes through a "wrapper" service that delivers the content through a server-side process in addition to the http server. The appendix contains a tar file of all the objects, scripts used to test the objects, the test data, and the Matlab scripts used to generate the visualization of the data.

| DL Root | Content | Data Storage | Contributor | Access |
|---|---|---|---|---|
| w.harvard.edu | Astrophysics | Centralized & distributed | Publishers, societies, universities | Wrapper |
| ccdb.kek.jp | Astrophysics | Centralized | Universities, laboratories | Wrapper |
| citeseer.nj.nec.com | Computer science | Centralized & distributed | Scraped from web pages | Wrapper |
| cogprints.soton.ac.uk | Cognitive science | Centralized | Individual authors | Direct |
| data.mpi-sb.mpg.de | Mathematics, computer science | Centralized | MPI | Wrapper |
| ideas.uqam.ca | Economics | Distributed | Individual authors | Direct |
| mtrs.msfc.nasa.gov | Aerospace | Centralized | MSFC | Direct |
| pubmedcentral.nih.gov | Medical science | Centralized | Publishers | Wrapper |
| stinet.dtic.mil | General science | Centralized | Universities, laboratories | Wrapper |
| www.arxiv.org | Physics, mathematics, computer science | Centralized | Individual authors | Wrapper |
| www.inria.fr | Computer science | Centralized | INRIA | Direct |
| www.jlab.org | Physics | Centralized | JLab | Direct |
| www.ncstrl.org | Computer science | Distributed | Universities | Wrapper |
| www.nzdl.org | Computer science | Centralized & distributed | Universities | Wrapper |
| www.onera.fr | Aerospace | Centralized | ONERA | Direct |
| www.rand.org | Economics, mathematics, social sciences | Centralized | RAND | Direct |
| www.research.digital.com | Computer science | Centralized | COMPAQ/DEC | Direct |
| www.santafe.edu | Mathematics, economics | Centralized | SFI | Direct |
| www.slac.stanford.edu | Physics | Centralized | Universities, laboratories | Direct |
| www.theses.org | General science | Centralized | Universities | Direct |
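
The testing procedure described in the introduction (download each object and record the number of bytes successfully transmitted) can be illustrated with a short sketch. This is not the authors' script, which is distributed in the appendix and not reproduced here. The function names `bytes_received`, `run_sample`, and `availability`, the 60-second timeout, and the rule that a sample counts as "available" when at least one byte arrives are all assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def bytes_received(url: str, timeout: float = 60.0) -> int:
    """Download `url` and return the number of bytes read (0 on any failure).

    Hypothetical helper: the timeout value is an assumption, not taken
    from the study.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return len(response.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Network errors, HTTP errors, and malformed URLs all count as
        # zero bytes transmitted for this sample.
        return 0


def run_sample(
    urls: Iterable[str],
    fetch: Callable[[str], int] = bytes_received,
) -> Dict[str, int]:
    """Take one data sample: the byte count observed for every object URL."""
    return {url: fetch(url) for url in urls}


def availability(history: List[int]) -> float:
    """Fraction of samples in which at least one byte was received.

    Assumed rule for illustration only; the study's own criteria for
    declaring an object unavailable also involved manual searching.
    """
    if not history:
        return 0.0
    return sum(1 for n in history if n > 0) / len(history)
```

A scheduler such as cron could call `run_sample` on the fixed list of object URLs at each sampling time and append the results to a log. A nonzero byte count only shows that something was returned; as the text notes, it does not by itself show that the object is still the same object.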

Similar resources

A study of the observance of knowledge management criteria in the websites of selected digital libraries in Iran

Background and Aim: Considering the elements of knowledge management (availability, creation, and transfer of knowledge) is very important in digital library websites and improves their performance. This paper aims to identify the knowledge management criteria in the websites of selected Iranian digital libraries and to study the degree to which they are observed. Materials and Methods: The research method was des...

Indicators for the design and evaluation of digital libraries

Introduction: There has always been uncertainty regarding the concept and frameworks of digital libraries. Terms such as electronic library, virtual library, library without walls, hybrid library, and digital library have often been used together, or interchangeably, to convey the library concept. Studies have shown that so far there is no standard and universally accepted definition for digital libraries, howe...

Context-aware systems: concept, functions and applications in digital libraries

Background and Aim: Libraries are among the places where context-aware systems and services would be very useful. The purpose of this study is to achieve a coherent definition of context-aware systems and applications, especially in digital libraries. Method: This was a review article conducted using the library method, by searching articles and e-books on websites and databases. Results:...

Proposed content framework for digital literacy education to users in Iran

Aim: Today, digital literacy, as a set of skills that enable people to use digital space effectively for success in personal, educational and professional life, has become a necessity in all societies, and public libraries are among the most important providers of digital literacy education in the world. Digital literacy education has not been considered in public libraries in Iran. The first s...

A Systematic Review of Data Mining Applications in Digital Libraries

Purpose: This study aimed to identify the applications of data mining in the provision of services, collection, and management of digital libraries. Methodology: This is an applied study in terms of purpose and a qualitative study in terms of method, conducted as a systematic review. For this purpose, articles were obtained by searching the databases of Springer, Emerald, ProQuest,...

Journal title:
  • D-Lib Magazine

Volume 8  Issue 

Pages  -

Publication date 2002